Segmentation by BERT
https://gyazo.com/ce8fa595d5d7f26133dcbee2f5862c43
The default label for each token can simply be "copy".
In that case, the input would be chopped into token sequences of a suitable length.
Training data could come from Scrapbox: take bulleted sections (excluding code regions, etc.), join them together, and train the model to split them back correctly.
Delete unneeded tokens
There is no suitable dataset for this.
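The idea of joining Scrapbox bullets and learning to split them again could be sketched roughly as follows. This is only an illustration under several assumptions, not a tested pipeline: whitespace splitting stands in for a real BERT tokenizer (which Japanese text would need), the label name `split` and the helper names `strip_code_blocks`, `make_example`, and `chunk` are hypothetical, and code regions are assumed to be a `code:` line followed by more deeply indented lines.

```python
COPY, SPLIT = "copy", "split"  # "copy" is the default label; "split" is an assumed name


def strip_code_blocks(lines):
    """Drop a 'code:' line and the more deeply indented lines under it."""
    kept = []
    code_indent = None  # indent of the current 'code:' line, if we are inside a block
    for line in lines:
        indent = len(line) - len(line.lstrip(" \t"))
        if code_indent is not None:
            if line.strip() and indent > code_indent:
                continue  # still inside the code block
            code_indent = None  # indentation returned: the block has ended
        if line.lstrip(" \t").startswith("code:"):
            code_indent = indent
            continue
        kept.append(line)
    return kept


def make_example(lines):
    """Join bullet lines into one token sequence with per-token labels.

    The first token of every bullet except the first is labeled SPLIT;
    all other tokens keep the default COPY label.
    """
    tokens, labels = [], []
    for line in strip_code_blocks(lines):
        words = line.split()  # stand-in for a real subword tokenizer
        for i, word in enumerate(words):
            is_boundary = i == 0 and len(tokens) > 0
            tokens.append(word)
            labels.append(SPLIT if is_boundary else COPY)
    return tokens, labels


def chunk(tokens, labels, max_len):
    """Chop the joined sequence into windows of at most max_len tokens."""
    return [
        (tokens[i:i + max_len], labels[i:i + max_len])
        for i in range(0, len(tokens), max_len)
    ]


if __name__ == "__main__":
    page = [
        "BERT segmentation idea",
        " code:python",
        "  print('x')",
        " default label is copy",
        "delete unneeded tokens",
    ]
    toks, labs = make_example(page)
    for window_tokens, window_labels in chunk(toks, labs, max_len=4):
        print(list(zip(window_tokens, window_labels)))
```

Each window could then be fed to a token-classification model. A "delete" label for unneeded tokens would fit the same per-token scheme, but this sketch does not generate such labels because the original bullets give no signal for them.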
---
This page is auto-translated from /nishio/BERTによる分節化. If you find something interesting but the auto-translated English is not good enough to understand it, feel free to let me know at @nishio_en. I'm very happy to spread my thoughts to non-Japanese readers.